Abstract

Modeling high order multivariate probability distributions is a difficult problem which arises in many fields. The copula approach is a good choice for this purpose, but the curse of dimensionality remains a problem. In this paper we give a theorem which expresses a multivariate copula using only some lower-dimensional ones, based on the conditional independences between the variables. In general, the construction of a multivariate copula using this theorem is quite difficult, due to the consistency properties which have to be fulfilled. For this purpose we introduce the sample derivated copula, and prove that the dependence between the random variables involved depends only on this copula and on the partition. By using the sample derivated copula, the theorem can be successfully applied in order to construct a multivariate discrete copula from some of its marginals.

Keywords: 
1. Introduction

First we motivate, from an information-theoretical point of view, why one should model the multivariate distribution by copulas. The information content of a multivariate probability distribution depends only on its copula density. In [12] and [1] this result is shown for the two-dimensional case, and the same holds in higher dimensions, too.

In this paper we prove a theorem which links the multivariate probability distribution assigned to a junction tree to the multivariate copula. It is known that the probability distribution assigned to a junction tree exploits the conditional independence structure underlying the random variables, so the copula introduced here will have this property, too.

In this introductory part we describe the main concepts and introduce the notation used in the paper. In the second section we prove a theorem which links a multivariate copula to the junction tree probability distribution. In the third section we introduce the concept of the Sample Derivated Copula (SDC), which makes it possible to exploit the conditional independences between the random variables. We prove that the information content of the probability distribution given by a partition set depends only on the SDC. In the fourth section we apply the junction tree approach to the SDC.

We finish the paper with conclusions and possible applications. 

Let $V = \{1, \dots, n\}$ be a set of vertices. A hypergraph is a set $V$ of vertices together with a set $\Gamma$ of subsets of $V$. A hypergraph is acyclic if no element of $\Gamma$ is a subset of another element, and if the elements of $\Gamma$ can be ordered $(K_1, \dots, K_m)$ so as to have the running intersection property: for all $j \ge 2$ there exists $i < j$ such that $K_i \supseteq K_j \cap (K_1 \cup \dots \cup K_{j-1})$.

It is convenient to introduce the so-called separator sets $S_j = K_j \cap (K_1 \cup \dots \cup K_{j-1})$, where $S_1 = \emptyset$.

We note here that if $R_j = K_j \setminus S_j$, then $S_j$ separates (in graph terms) the vertices in $R_j$ from the vertices in $(K_1 \cup \dots \cup K_{j-1}) \setminus S_j$.

We mention here that a hypergraph $(V, \Gamma)$ is acyclic if and only if $\Gamma$ can be considered to be the set of cliques of a chordal (triangulated) graph [16].
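The running intersection property and the separator sets $S_j$ can be checked mechanically for a given ordering of the cliques. The following Python sketch is illustrative only: the function names `separators` and `has_running_intersection` are hypothetical, and the check applies to one fixed ordering (the hypergraph is acyclic if some ordering passes).

```python
from typing import List, Set


def separators(cliques: List[Set[int]]) -> List[Set[int]]:
    """S_j = K_j ∩ (K_1 ∪ ... ∪ K_{j-1}) for an ordered list of cliques; S_1 is empty."""
    seps: List[Set[int]] = []
    seen: Set[int] = set()
    for k in cliques:
        seps.append(k & seen)
        seen |= k
    return seps


def has_running_intersection(cliques: List[Set[int]]) -> bool:
    """True if no clique is contained in another and, for every j >= 2,
    some earlier clique K_i (i < j) contains the separator S_j."""
    for a, ka in enumerate(cliques):
        for b, kb in enumerate(cliques):
            if a != b and ka <= kb:
                return False
    seps = separators(cliques)
    return all(
        any(seps[j] <= cliques[i] for i in range(j))
        for j in range(1, len(cliques))
    )


# K_1 = {1,2,3}, K_2 = {2,3,4}, K_3 = {4,5}: S_2 = {2,3}, S_3 = {4}.
print(separators([{1, 2, 3}, {2, 3, 4}, {4, 5}]))
print(has_running_intersection([{1, 2, 3}, {2, 3, 4}, {4, 5}]))
```

The ordering matters: $(\{1,2\},\{3,4\},\{2,3\})$ fails the check while $(\{1,2\},\{2,3\},\{3,4\})$ passes, and no ordering of the four edges of a 4-cycle passes.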



In the following we consider acyclic hypergraphs with the property that the union of all sets in $\Gamma$ is $V$. We denote the set of separators by $\mathcal{S}$ and refer to the acyclic hypergraph as $(V, \Gamma, \mathcal{S})$.

Let $V = \{1, 2, \dots, n\}$ be the set of indices of the continuous random variables $X = \{X_1, \dots, X_n\}$. We suppose that the probability density functions of $X_1, \dots, X_n$ exist and denote them by $f_{X_1}, \dots, f_{X_n}$.

We need the following notations: 

• $F_{X_i}(x_i) = P(X_i < x_i,\ X_j < \infty \text{ for all } j \neq i)$ stands for the univariate marginal cumulative distribution function corresponding to the variable $X_i$,

• The joint probability density function and the joint cumulative distribution function of $(X_1, \dots, X_n)^T$ are denoted by $f_{\mathbf{X}}(\mathbf{x})$ and $F_{\mathbf{X}}(\mathbf{x})$, respectively,

• $D = \{i_1, \dots, i_d\} \subseteq V$, $\mathbf{X}_D = (X_{i_1}, \dots, X_{i_d})^T$, $\mathbf{x}_D = (x_{i_1}, \dots, x_{i_d})^T$,

• The $d$-th order marginal probability density function and the $d$-th order marginal cumulative distribution function of $\mathbf{X}_D$ are denoted by $f_{\mathbf{X}_D}(\mathbf{x}_D)$ and $F_{\mathbf{X}_D}(\mathbf{x}_D)$, respectively.

With this notation we can introduce the concept of the junction tree. It is known that the junction tree encodes the conditional independences between the variables. Let us remark that from now on the indices of the random variables are assigned to the nodes of a graph. In the graph, a set of nodes $B$ separates a set of nodes $A$ from another set of nodes $C$, where $A$, $B$, $C$ are disjoint subsets of $V$, if and only if $\mathbf{X}_A$ and $\mathbf{X}_C$ are conditionally independent with respect to $\mathbf{X}_B$ (see the definition of the Markov random field).

Definition 1. A junction tree over $X$ is a cluster tree which is assigned to an acyclic hypergraph $(V, \Gamma, \mathcal{S})$ as follows:

1) Each cluster of the cluster tree consists of a subset $\mathbf{X}_K$ of $X$, where $K \in \Gamma$. To each cluster the joint marginal density function $f_{\mathbf{X}_K}(\mathbf{x}_K)$ is assigned;

2) Each edge connecting two clusters is called a separator and consists of a subset $\mathbf{X}_S$ of $X$, where $S$ is a separator set. To each separator the marginal probability density function $f_{\mathbf{X}_S}(\mathbf{x}_S)$ is assigned;

3) The union of all clusters is X. 

Definition 2. A junction tree probability distribution is a probability dis- 
tribution assigned to the junction tree in the following way: 



where vs is the number of those clusters which contain all the variables of 
It is useful to note here that, since in the hypergraph $(V, \Gamma, \mathcal{S})$ the set $S_j$ separates (in graph terms) the vertices in $R_j = K_j \setminus S_j$ from the vertices in $(K_1 \cup \dots \cup K_{j-1}) \setminus S_j$, the random variables with indices in $R_j$ and the variables with indices in $(K_1 \cup \dots \cup K_{j-1}) \setminus S_j$ are conditionally independent with respect to the variables with indices in $S_j$.

Remark 1. Since the junction tree is assigned to an acyclic hypergraph, the running intersection property holds for the junction tree, too. It can be reformulated as follows: if two clusters contain a random variable, then all clusters on the path between these clusters contain this random variable.
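Definition 2 can be checked numerically if discrete probabilities stand in for densities. In the sketch below (Python with NumPy), the three-state Markov chain $X_1 - X_2 - X_3$ and its random parameters are purely illustrative assumptions. The junction tree has clusters $\{1,2\}$, $\{2,3\}$ and separator $\{2\}$ with $\nu_S = 2$; because $X_1$ and $X_3$ are conditionally independent given $X_2$, the junction tree formula reproduces the joint distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Markov chain X1 - X2 - X3 with 3 states each: p(x1) p(x2|x1) p(x3|x2).
p1 = rng.dirichlet(np.ones(3))
p2_given_1 = rng.dirichlet(np.ones(3), size=3)   # rows: x1, columns: x2
p3_given_2 = rng.dirichlet(np.ones(3), size=3)   # rows: x2, columns: x3
joint = p1[:, None, None] * p2_given_1[:, :, None] * p3_given_2[None, :, :]

# Cluster marginals p(x1, x2), p(x2, x3) and separator marginal p(x2).
p12 = joint.sum(axis=2)
p23 = joint.sum(axis=0)
p2 = joint.sum(axis=(0, 2))

# Junction tree distribution: product over clusters / separator^(nu_S - 1), nu_S = 2.
jt = p12[:, :, None] * p23[None, :, :] / p2[None, :, None] ** (2 - 1)
print(np.allclose(jt, joint))
```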

First we recall the concept of a copula and formulate Sklar's theorem (see j^j and (lif)).

Definition 3. A function $C : [0,1]^d \to [0,1]$ is called a $d$-dimensional copula if it satisfies the following conditions:

1) $C(u_1, \dots, u_d)$ is increasing in each component $u_i$,

2) $C(u_1, \dots, u_{i-1}, 0, u_{i+1}, \dots, u_d) = 0$ for all $u_k \in [0,1]$, $k \neq i$, $i = 1, \dots, d$,

3) $C(1, \dots, 1, u_i, 1, \dots, 1) = u_i$ for all $u_i \in [0,1]$, $i = 1, \dots, d$,

4) $C$ is $d$-increasing, i.e. for all $(u_{1,1}, \dots, u_{1,d})$ and $(u_{2,1}, \dots, u_{2,d})$ in $[0,1]^d$ with $u_{1,i} < u_{2,i}$ for all $i$, we have

$$\sum_{i_1=1}^{2} \cdots \sum_{i_d=1}^{2} (-1)^{\sum_{j=1}^{d} i_j}\, C(u_{i_1,1}, \dots, u_{i_d,d}) \ge 0.$$
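Condition 4) says that every box in $[0,1]^d$ receives nonnegative mass. The alternating sum can be evaluated directly; the Python sketch below (function names are hypothetical, chosen for illustration) applies it to the independence copula, to the comonotone copula $M(\mathbf{u}) = \min_i u_i$, and to $\max(u_1, u_2)$, which satisfies the boundary conditions only loosely and fails the 2-increasing test.

```python
import itertools
import math
from typing import Callable, Sequence


def c_volume(C: Callable[[Sequence[float]], float],
             lower: Sequence[float], upper: Sequence[float]) -> float:
    """Left-hand side of condition 4): sum over (i_1, ..., i_d) in {1, 2}^d of
    (-1)^(i_1 + ... + i_d) * C(u_{i_1,1}, ..., u_{i_d,d}),
    with u_{1,j} = lower[j] and u_{2,j} = upper[j]."""
    total = 0.0
    for idx in itertools.product((1, 2), repeat=len(lower)):
        point = [lo if i == 1 else hi for i, lo, hi in zip(idx, lower, upper)]
        total += (-1) ** sum(idx) * C(point)
    return total


def product_copula(u: Sequence[float]) -> float:
    """Independence copula C(u) = u_1 * ... * u_d."""
    return math.prod(u)


def upper_frechet(u: Sequence[float]) -> float:
    """Comonotone copula M(u) = min(u_1, ..., u_d)."""
    return min(u)


lower, upper = [0.1, 0.2, 0.3], [0.5, 0.6, 0.9]
print(c_volume(product_copula, lower, upper))   # (0.5-0.1)(0.6-0.2)(0.9-0.3)
print(c_volume(upper_frechet, lower, upper))    # min(upper) - max(lower)
print(c_volume(lambda u: max(u), [0.2, 0.2], [0.8, 0.8]))  # negative box mass
```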

By Sklar's theorem, if $X_1, \dots, X_d$ are continuous random variables defined on a common probability space, with univariate marginal cdf's $F_{X_i}(x_i)$ and joint cdf $F_{X_1, \dots, X_d}(x_1, \dots, x_d)$, then there exists a unique copula function $C_{X_1, \dots, X_d}(u_1, \dots, u_d) : [0,1]^d \to [0,1]$ such that by the substitution $u_i = F_{X_i}(x_i)$, $i = 1, \dots, d$ we get

$$F_{X_1, \dots, X_d}(x_1, \dots, x_d) = C_{X_1, \dots, X_d}\left(F_{X_1}(x_1), \dots, F_{X_d}(x_d)\right)$$

for all $(x_1, \dots, x_d)^T \in \mathbb{R}^d$.

In the following we will use the vectorial notation $F_{\mathbf{X}_D}(\mathbf{x}_D) = C_{\mathbf{X}_D}(\mathbf{u}_D)$, where $\mathbf{u}_D = \left(F_{X_{i_1}}(x_{i_1}), \dots, F_{X_{i_d}}(x_{i_d})\right)^T$.
We need the following assertion:
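$$f_{X_{i_1}, \dots, X_{i_d}}(x_{i_1}, \dots, x_{i_d}) = \frac{\partial^d F_{X_{i_1}, \dots, X_{i_d}}(x_{i_1}, \dots, x_{i_d})}{\partial x_{i_1} \cdots \partial x_{i_d}} = \frac{\partial^d C_{X_{i_1}, \dots, X_{i_d}}\left(F_{X_{i_1}}(x_{i_1}), \dots, F_{X_{i_d}}(x_{i_d})\right)}{\partial x_{i_1} \cdots \partial x_{i_d}}$$

$$= \frac{\partial^d C_{X_{i_1}, \dots, X_{i_d}}(u_{i_1}, \dots, u_{i_d})}{\partial u_{i_1} \cdots \partial u_{i_d}} \prod_{k=1}^{d} \frac{\partial F_{X_{i_k}}(x_{i_k})}{\partial x_{i_k}} = c_{X_{i_1}, \dots, X_{i_d}}\left(F_{X_{i_1}}(x_{i_1}), \dots, F_{X_{i_d}}(x_{i_d})\right) \cdot \prod_{k=1}^{d} f_{X_{i_k}}(x_{i_k}). \qquad (1)$$

In vectorial notation this can be written as

$$f_{\mathbf{X}_D}(\mathbf{x}_D) = c_{\mathbf{X}_D}(\mathbf{u}_D) \cdot \prod_{i_k \in D} f_{X_{i_k}}(x_{i_k}),$$

and from (1) we get

$$c_{\mathbf{X}_D}(\mathbf{u}_D) = \frac{f_{\mathbf{X}_D}(\mathbf{x}_D)}{\prod_{i_k \in D} f_{X_{i_k}}(x_{i_k})}. \qquad (2)$$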
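Formula (2) can be sanity-checked on a case where the copula density is known in closed form. The Python sketch below assumes, purely for illustration, a bivariate normal vector with standard normal marginals and correlation $\rho$; it computes $c(u_1, u_2)$ once through (2) and once through the closed-form Gaussian copula density. The function names are hypothetical; `statistics.NormalDist` is from the Python standard library.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal marginals (an illustrative assumption)


def bivariate_normal_pdf(x1: float, x2: float, rho: float) -> float:
    """Joint density of a standard bivariate normal vector with correlation rho."""
    det = 1.0 - rho ** 2
    q = (x1 ** 2 - 2 * rho * x1 * x2 + x2 ** 2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def copula_density_via_eq2(u1: float, u2: float, rho: float) -> float:
    """Formula (2): c(u) = f(x) / (f_1(x_1) f_2(x_2)) with x_i = F_i^{-1}(u_i)."""
    x1, x2 = STD.inv_cdf(u1), STD.inv_cdf(u2)
    return bivariate_normal_pdf(x1, x2, rho) / (STD.pdf(x1) * STD.pdf(x2))


def gaussian_copula_density(u1: float, u2: float, rho: float) -> float:
    """Closed-form bivariate Gaussian copula density, used only as a cross-check."""
    a, b = STD.inv_cdf(u1), STD.inv_cdf(u2)
    det = 1.0 - rho ** 2
    return math.exp(-(rho ** 2 * (a ** 2 + b ** 2) - 2 * rho * a * b) / (2 * det)) / math.sqrt(det)


for u1, u2 in [(0.1, 0.9), (0.5, 0.5), (0.3, 0.7)]:
    print(copula_density_via_eq2(u1, u2, 0.6), gaussian_copula_density(u1, u2, 0.6))
```

For $\rho = 0$ both functions return 1, the density of the independence copula.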



2. The multivariate copula associated to a junction tree probability 
distribution. 

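Theorem 1. The copula density function associated to a junction tree probability distribution

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\prod_{K \in \Gamma} f_{\mathbf{X}_K}(\mathbf{x}_K)}{\prod_{S \in \mathcal{S}} \left[f_{\mathbf{X}_S}(\mathbf{x}_S)\right]^{\nu_S - 1}}$$

is given by

$$c_{\mathbf{X}}(\mathbf{u}) = \frac{\prod_{K \in \Gamma} c_{\mathbf{X}_K}(\mathbf{u}_K)}{\prod_{S \in \mathcal{S}} \left[c_{\mathbf{X}_S}(\mathbf{u}_S)\right]^{\nu_S - 1}}. \qquad (3)$$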



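Proof.

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\prod_{K \in \Gamma} f_{\mathbf{X}_K}(\mathbf{x}_K)}{\prod_{S \in \mathcal{S}} \left[f_{\mathbf{X}_S}(\mathbf{x}_S)\right]^{\nu_S - 1}} = \frac{\prod_{K \in \Gamma} \left[c_{\mathbf{X}_K}(\mathbf{u}_K) \cdot \prod_{i_k \in K} f_{X_{i_k}}(x_{i_k})\right]}{\prod_{S \in \mathcal{S}} \left[c_{\mathbf{X}_S}(\mathbf{u}_S) \cdot \prod_{i_k \in S} f_{X_{i_k}}(x_{i_k})\right]^{\nu_S - 1}}. \qquad (4)$$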



The question that we have to answer is how many times the probability density function $f_{X_i}(x_i)$ of each random variable $X_i$ appears in the numerator and in the denominator, respectively.

Since $\bigcup_{K \in \Gamma} \mathbf{X}_K = X$, for each random variable $X_i \in X$ the density $f_{X_i}(x_i)$ appears at least once in the numerator.

Now we prove that in the junction tree over $X$ the number of clusters which contain a variable $X_i$ is greater by 1 than the number of separators which contain the same variable. This is true for all $i = 1, \dots, n$. This means

$$\#\{K \in \Gamma \mid X_i \in \mathbf{X}_K\} = \#\{S \in \mathcal{S} \mid X_i \in \mathbf{X}_S\} + 1.$$

For a variable $X_i$ we denote $\#\{S \in \mathcal{S} \mid X_i \in \mathbf{X}_S\}$ by $t$.

Case $t = 0$.

The statement is a consequence of the definition of the junction tree: the union of all clusters is $X$, so every variable has to appear in at least one cluster. $X_i$ cannot appear in two clusters, because in this case there would exist a separator which also contains $X_i$, and we supposed that there is no such separator ($t = 0$).

Case $t > 0$.

If two clusters contain the variable $X_i$, then every cluster on the path between the two clusters contains $X_i$ (running intersection property). It follows that the clusters containing $X_i$ are the nodes of a connected subgraph of the junction tree, and this subgraph is a tree. Every separator on an edge of this subtree contains $X_i$, and conversely every separator containing $X_i$ connects two clusters which both contain $X_i$. A tree with $t$ edges has $t + 1$ nodes, so there are exactly $t + 1$ clusters that contain $X_i$.
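The counting argument can be checked on small examples. In the following Python sketch (the function name and both example trees are hypothetical), the separator of a tree edge is taken to be the intersection of the two clusters it joins; the identity holds for a tree with the running intersection property and fails for a cluster tree that violates it.

```python
from collections import Counter
from typing import List, Set, Tuple


def cluster_separator_counts_match(clusters: List[Set[int]],
                                   edges: List[Tuple[int, int]]) -> bool:
    """For every variable: #clusters containing it == #separators containing it + 1.
    The separator of a tree edge (a, b) is the intersection of its two clusters."""
    seps = [clusters[a] & clusters[b] for a, b in edges]
    in_clusters = Counter(v for k in clusters for v in k)
    in_separators = Counter(v for s in seps for v in s)
    return all(in_clusters[v] == in_separators[v] + 1 for v in in_clusters)


# Junction tree with edges K0-K1, K1-K2, K1-K3 (running intersection holds).
jt_clusters = [{1, 2, 3}, {2, 3, 4}, {3, 5}, {4, 6}]
jt_edges = [(0, 1), (1, 2), (1, 3)]
print(cluster_separator_counts_match(jt_clusters, jt_edges))

# Cluster tree violating running intersection: variable 1 is in K0 and K2 but not in K1.
bad_clusters = [{1, 2}, {2, 3}, {1, 3}]
bad_edges = [(0, 1), (1, 2)]
print(cluster_separator_counts_match(bad_clusters, bad_edges))
```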

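Applying this result in formula (4), after simplification we obtain

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\prod_{K \in \Gamma} c_{\mathbf{X}_K}(\mathbf{u}_K)}{\prod_{S \in \mathcal{S}} \left[c_{\mathbf{X}_S}(\mathbf{u}_S)\right]^{\nu_S - 1}} \cdot \prod_{i=1}^{n} f_{X_i}(x_i).$$

Dividing both sides by $\prod_{i=1}^{n} f_{X_i}(x_i)$ we obtain

$$\frac{f_{\mathbf{X}}(\mathbf{x})}{\prod_{i=1}^{n} f_{X_i}(x_i)} = \frac{\prod_{K \in \Gamma} c_{\mathbf{X}_K}(\mathbf{u}_K)}{\prod_{S \in \mathcal{S}} \left[c_{\mathbf{X}_S}(\mathbf{u}_S)\right]^{\nu_S - 1}}. \qquad (5)$$

Equations (2) and (5) prove the statement of the theorem.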



We saw that if the conditional independence structure underlying the random variables makes the construction of a junction tree possible, then the multivariate copula density associated to the joint probability distribution can be expressed as a product and quotient of lower-dimensional copula densities.
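A numerical check of Theorem 1 on a continuous example: a trivariate normal vector with standard normal marginals and $\rho_{13} = \rho_{12}\rho_{23}$ satisfies $X_1 \perp X_3 \mid X_2$, so its junction tree has clusters $\{1,2\}$, $\{2,3\}$ and separator $\{2\}$. The Python sketch below (NumPy plus the standard library; the correlations and the evaluation point are arbitrary illustrative choices) computes every copula density through formula (2) and compares both sides of (3).

```python
import math
from statistics import NormalDist

import numpy as np

STD = NormalDist()  # standard normal marginals, an illustrative assumption


def mvn_pdf(x, cov) -> float:
    """Density of a zero-mean multivariate normal vector with covariance matrix cov."""
    x = np.asarray(x, dtype=float)
    quad = float(x @ np.linalg.solve(cov, x))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** len(x) * np.linalg.det(cov))


def copula_density(u, cov) -> float:
    """Formula (2): c(u) = f(x) / prod f_i(x_i) with x_i = F_i^{-1}(u_i)."""
    x = [STD.inv_cdf(ui) for ui in u]
    return mvn_pdf(x, cov) / math.prod(STD.pdf(xi) for xi in x)


def corr_matrix(r12: float, r23: float, r13: float) -> np.ndarray:
    return np.array([[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]])


u = (0.2, 0.65, 0.9)
r12, r23 = 0.7, -0.4
cov = corr_matrix(r12, r23, r12 * r23)   # r13 = r12 * r23: X1, X3 independent given X2

# Right-hand side of (3): clusters {1,2}, {2,3}, separator {2} with nu_S - 1 = 1.
rhs = (copula_density(u[:2], cov[:2, :2]) * copula_density(u[1:], cov[1:, 1:])
       / copula_density(u[1:2], cov[1:2, 1:2]))
lhs = copula_density(u, cov)
print(lhs, rhs)
```

Replacing $\rho_{13}$ by a value different from $\rho_{12}\rho_{23}$ (for example 0) breaks the conditional independence, and the two sides no longer agree.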

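A natural question is the following: what conditions are necessary for (3) to be a copula density? It is easy to see that products and quotients of copula densities are nonnegative. So

$$c(\mathbf{u}) = \frac{\prod_{K \in \Gamma} c_K(\mathbf{u}_K)}{\prod_{S \in \mathcal{S}} \left[c_S(\mathbf{u}_S)\right]^{\nu_S - 1}}$$

will be a copula density if and only if

$$\int_{[0,1]^n} \frac{\prod_{K \in \Gamma} c_K(\mathbf{u}_K)}{\prod_{S \in \mathcal{S}} \left[c_S(\mathbf{u}_S)\right]^{\nu_S - 1}} \, d\mathbf{u} = 1.$$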

This happens if the following consistency conditions are fulfilled for all connected clique pairs $K_i$ and $K_j$:

$$\int c_{K_i}(\mathbf{u}_{K_i}) \, d\mathbf{u}_{K_i \setminus S_{ij}} = \int c_{K_j}(\mathbf{u}_{K_j}) \, d\mathbf{u}_{K_j \setminus S_{ij}},$$

where $S_{ij} = K_i \cap K_j$. We emphasize here that all cliques are elements of the set $\Gamma$ of the acyclic hypergraph $(V, \Gamma, \mathcal{S})$.

These conditions are fulfilled if $c_{S_{ij}}(\mathbf{u}_{S_{ij}})$ is the marginal probability density of $c_{K_i}(\mathbf{u}_{K_i})$ whenever $S_{ij}$ is connected to the cluster $K_i$. In terms of copula functions this can be expressed as follows. For $\{k_1, \dots, k_m\} = K_i$ and $\{s_1, \dots, s_l\} = S_{ij}$, $\{s_1, \dots, s_l\} \subset \{k_1, \dots, k_m\}$, we require $C_m(u_{k_1}, \dots, u_{k_m}) = C_l(u_{s_1}, \dots, u_{s_l})$ whenever $u_{k_r} = 1$ for all $k_r \notin S_{ij}$. Copulas chosen separately for the clusters usually do not fulfill this condition.

Finding multivariate copulas which fulfill the consistency conditions is 
not a trivial task. 

A special type of conditional independence, where the graph underlying the random variables is star-like, is treated in [19]. Another special type of multivariate copula, where the underlying conditional independence graph is a tree, can be found in jgj].

For discrete random variables, the conditional independences are exploited by Markov random fields. In physics, for two-valued random variables, the Ising model is well known. In these cases the random variables take on only a few values. However, the problem is often hard. The great advantage of the discrete approach is that the marginal probability distributions involved fulfill the consistency conditions (see [17] and [11]).



If we have an i.i.d. sample of size $N$ from a continuous joint probability distribution, then for each random variable we have $N$ different values. For this case, empirical copulas were introduced and first studied by P. Deheuvels in jij, who called them empirical dependence functions. Later, in [10] and …, the so-called discrete copulas were introduced. Two-dimensional empirical copulas are discussed in Nelsen's introductory book (see [13]). When we are dealing with a sample drawn from a continuous joint probability distribution, the number of distinct values of these random variables would be too large, so we apply a uniform partition and define the so-called sample derivated copula.

3. The sample derivated copula. 

Let $X_1, \dots, X_n$ be continuous random variables on the same probability space. Let

$$\begin{matrix} x_1^1, & \dots, & x_n^1 \\ \vdots & & \vdots \\ x_1^N, & \dots, & x_n^N \end{matrix} \qquad (6)$$

be an i.i.d. sample of size $N$ taken from the joint probability distribution of the random vector $(X_1, \dots, X_n)^T$.
Since any sample element occurs twice in the sample with probability zero, we can suppose that the sample elements are distinct.

We denote the set of the values of $X_i$ in the sample by $\mathcal{A}_i$; this set contains $N$ values for each random variable. The theoretical range of the continuous random variable $X_i$ will be denoted by $\mathcal{X}_i$. For every $i$ we denote $x_i^{\min} = \min \mathcal{X}_i \in \mathbb{R}$ and $x_i^{\max} = \max \mathcal{X}_i \in \mathbb{R}$. We suppose for simplicity that $\min \mathcal{A}_i \neq \min \mathcal{X}_i$ and $\max \mathcal{A}_i \neq \max \mathcal{X}_i$. For each random variable $X_i$ we define a partition of $\mathcal{X}_i$ by $\mathcal{P}_i = \{x_0^i = x_i^{\min}, x_1^i, \dots, x_{m_i - 1}^i, x_{m_i}^i = x_i^{\max}\}$ with the following properties:

• For each random variable $X_i$, each interval $(x_{j-1}^i, x_j^i]$, $j = 1, \dots, m_i$, contains the same given number $n_i = N / m_i \in \mathbb{N}$ of values from the set $\mathcal{A}_i$.

• Each $x_j^i \in \mathcal{A}_i$, $j = 1, \dots, m_i - 1$.

A partition with the above properties will be called a uniform partition. We denote by $\mathcal{P}$ the set of partitions $\{\mathcal{P}_1, \dots, \mathcal{P}_n\}$.
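A uniform partition of one variable is determined by the order statistics of its sample values: the inner points are the $n_i$-th, $2n_i$-th, ..., $(m_i - 1)n_i$-th smallest values. The Python sketch below assumes a finite range whose end points are passed in as `x_min` and `x_max`; the function name and the data are hypothetical.

```python
import numpy as np


def uniform_partition(values, m: int, x_min: float, x_max: float) -> np.ndarray:
    """Points x_0 < x_1 < ... < x_m of a uniform partition: every interval (x_{j-1}, x_j]
    holds n = N / m sample values and the inner points are sample values.
    x_min and x_max stand for the (assumed finite) ends of the range of X_i."""
    v = np.sort(np.asarray(values, dtype=float))
    N = v.size
    if N % m != 0:
        raise ValueError("n_i = N / m_i must be an integer")
    n = N // m
    inner = v[n - 1 : N - 1 : n]  # the n-th, 2n-th, ..., (m-1)n-th order statistics
    return np.concatenate(([x_min], inner, [x_max]))


rng = np.random.default_rng(0)
sample = rng.uniform(size=12)          # hypothetical sample; range of X_i taken as (0, 1)
points = uniform_partition(sample, m=4, x_min=0.0, x_max=1.0)
counts = [int(np.sum((sample > a) & (sample <= b))) for a, b in zip(points[:-1], points[1:])]
print(points, counts)
```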

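Let $\hat{X}_i$ be the categorical random variable associated to the random variable $X_i$:

$$P\left(\hat{X}_i \in (x_{j-1}^i, x_j^i]\right) = \frac{1}{m_i}, \quad j = 1, \dots, m_i.$$

We assign to each $x^i \in (x_{j-1}^i, x_j^i]$ the number $u_j^i = \frac{j}{m_i}$, where $u_j^i$ is defined for $j = 0, \dots, m_i$. Obviously $u_0^i = 0$ and $u_{m_i}^i = 1$. Let $\mathcal{U}_i = \{u_j^i \mid j = 0, \dots, m_i\}$. So we can define the following discrete uniform random variables:

$$U_i = \begin{pmatrix} u_1^i & u_2^i & \dots & u_{m_i - 1}^i & u_{m_i}^i \\ \frac{1}{m_i} & \frac{1}{m_i} & \dots & \frac{1}{m_i} & \frac{1}{m_i} \end{pmatrix}, \quad i = 1, \dots, n.$$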



Now we transform the sample (6) using the above assignment. We denote the transformed sample by $\mathcal{T}$.
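Because the partition points are order statistics, the transformation can be computed from ranks alone: with distinct values, the value of rank $r$ falls into interval $j = \lceil r / n_i \rceil$ and is mapped to $u_j^i = j / m_i$. The Python sketch below (hypothetical function name and data) implements this rank formulation.

```python
import numpy as np


def transform_sample(sample, m):
    """Replace each value in column i by u_j^i = j / m_i, where j is the index of its
    interval in the uniform partition. For distinct values, the value of rank r (1..N)
    lies in interval j = ceil(r / n_i) = (r - 1) // n_i + 1, with n_i = N / m_i."""
    sample = np.asarray(sample, dtype=float)
    N, n_vars = sample.shape
    T = np.empty((N, n_vars))
    for i in range(n_vars):
        if N % m[i] != 0:
            raise ValueError("N must be divisible by every m_i")
        n_i = N // m[i]
        ranks = sample[:, i].argsort().argsort() + 1   # ranks 1..N, values assumed distinct
        T[:, i] = ((ranks - 1) // n_i + 1) / m[i]
    return T


rng = np.random.default_rng(1)
x = rng.normal(size=(12, 2))           # hypothetical sample: N = 12, n = 2 variables
T = transform_sample(x, m=[4, 3])
print(T)
```

Each column of the result takes every value $u_j^i$, $j = 1, \dots, m_i$, exactly $n_i$ times, which is the discrete uniform law of $U_i$.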



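Definition 4. The function $c : \prod_{i=1}^{n} \mathcal{U}_i \to \mathbb{R}$ defined by

$$c\left(u_{j_1}^1, \dots, u_{j_n}^n\right) = \frac{\#\left\{\left(u_{j_1}^1, \dots, u_{j_n}^n\right) \in \mathcal{T}\right\}}{N}, \quad j_i = 1, \dots, m_i,$$

will be called the sample derivated copula distribution.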
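Remark 2. The maximum number of different vectors on which the above defined sample derivated copula distribution can take nonzero values equals $\prod_{i=1}^{n} m_i$.

Definition 5. The function $C : \prod_{i=1}^{n} \mathcal{U}_i \subset [0,1]^n \to [0,1]$ defined by

$$C\left(u_{k_1}^1, \dots, u_{k_n}^n\right) = \frac{\#\left\{(u^1, \dots, u^n) \in \mathcal{T} \mid u^i \le u_{k_i}^i,\ i = 1, \dots, n\right\}}{N}$$

will be called the sample derivated copula.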

Throughout the paper we use the notation $C_n$ for the $n$-dimensional sample derivated copula.
Theorem 2. The sample derivated copula is a copula. 

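Proof. 1) It is evident that $C_n$ is increasing in each of its components.

2) If there exists $s$ such that $u_{k_s}^s = 0$, then $C_n(u_{k_1}^1, \dots, u_{k_n}^n) = 0$. This follows directly from the definition: every coordinate $u^s$ of a vector of the transformed sample is at least $u_1^s = 1/m_s > 0$, so no vector satisfies $u^s \le 0$.

3) If $u_{k_s}^s = 1$ for all $s \neq l$, then

$$C_n\left(1, \dots, 1, u_{k_l}^l, 1, \dots, 1\right) = \frac{\#\left\{(u^1, \dots, u^l, \dots, u^n) \in \mathcal{T} \mid u^s \le 1\ \forall s \neq l \text{ and } u^l \le u_{k_l}^l\right\}}{N} = \frac{n_l \cdot k_l}{N} = \frac{k_l}{m_l} = u_{k_l}^l.$$

4) $C_n$ is $n$-increasing, as it is a cumulative probability distribution function.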
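Definitions 4 and 5 translate directly into counting. The Python sketch below works with the interval indices $j$ (so that $u = j/m_i$); the helper names and the three-variable data are hypothetical. It builds $c$ and $C$ from a transformed sample and checks property 3) of Theorem 2 for every variable and every grid value.

```python
from collections import Counter

import numpy as np


def to_indices(sample, m):
    """Interval indices j (1..m_i) under the uniform partition, via ranks; u = j / m_i."""
    N = sample.shape[0]
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1
    return (ranks - 1) // (N // np.asarray(m)) + 1


def sdc_distribution(T_idx):
    """Definition 4: c(u_{j_1}, ..., u_{j_n}) = #{vectors of T equal to it} / N."""
    N = len(T_idx)
    return {key: cnt / N for key, cnt in Counter(map(tuple, T_idx.tolist())).items()}


def sdc(T_idx, k) -> float:
    """Definition 5: C(u_{k_1}, ..., u_{k_n}) = #{vectors of T with j_i <= k_i for all i} / N."""
    return float(np.mean(np.all(T_idx <= np.asarray(k), axis=1)))


rng = np.random.default_rng(2)
N, m = 60, [3, 4, 5]
z = rng.normal(size=N)
sample = np.column_stack([z, z + rng.normal(size=N), rng.normal(size=N)])  # hypothetical
T_idx = to_indices(sample, m)
c = sdc_distribution(T_idx)

# Property 3): with all other arguments equal to 1 (index m_s), C equals u_{k_l} = k_l / m_l.
for l in range(3):
    for k_l in range(m[l] + 1):
        k = list(m)
        k[l] = k_l
        assert np.isclose(sdc(T_idx, k), k_l / m[l])
print(len(c), "distinct vectors out of at most", np.prod(m))
```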



Remark 3. The sample derivated copula differs from the empirical copula [5] and the discrete copula [10], [11]. One of the differences is that the cardinality of $\mathcal{U}_i$ is not necessarily the same for all $i = 1, \dots, n$. Another difference is that a marginal variable can take the same value in more than one vector (since $m_i < N$).
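Theorem 3. The sample derivated copula has the following consistency property. If $u_{k_s}^s = 1$ for all $s \in V \setminus \{l_1, \dots, l_q\}$, then

$$C_n\left(u_{k_1}^1, \dots, u_{k_n}^n\right) = C_q\left(u_{k_{l_1}}^{l_1}, \dots, u_{k_{l_q}}^{l_q}\right).$$

Proof.

$$C_n\left(u_{k_1}^1, \dots, u_{k_n}^n\right) = \frac{\#\left\{(u^1, \dots, u^n) \in \mathcal{T} \mid u^s \le 1,\ s \in V \setminus \{l_1, \dots, l_q\},\ u^{l_i} \le u_{k_{l_i}}^{l_i},\ i = 1, \dots, q\right\}}{N}$$

$$= \frac{\#\left\{(u^1, \dots, u^n) \in \mathcal{T} \mid u^{l_i} \le u_{k_{l_i}}^{l_i},\ i = 1, \dots, q\right\}}{N} = C_q\left(u_{k_{l_1}}^{l_1}, \dots, u_{k_{l_q}}^{l_q}\right).$$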

Remark 4. In general, copulas chosen separately for the clusters do not fulfill the consistency property.

Remark 5. This theorem ensures the consistency conditions that we need when constructing junction-tree-like copulas.
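Theorem 3 can be observed directly on data: the sample derivated copula built from a subset of the columns of $\mathcal{T}$ coincides with the corresponding marginal of the full one. The Python sketch below (hypothetical helper names and data) stores $c$ as an array indexed by the interval indices and compares both the distributions and the cumulative functions $C$.

```python
import numpy as np


def to_indices(sample, m):
    """Interval indices j (1..m_i) under the uniform partition, via ranks."""
    N = sample.shape[0]
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1
    return (ranks - 1) // (N // np.asarray(m)) + 1


def sdc_array(T_idx, m):
    """Sample derivated copula distribution as an array: c[j_1 - 1, ..., j_n - 1] = count / N."""
    c = np.zeros(tuple(m))
    np.add.at(c, tuple((T_idx - 1).T), 1.0)
    return c / len(T_idx)


def cdf(c):
    """Cumulative sums along every axis: the C of Definition 5 on the index grid."""
    for axis in range(c.ndim):
        c = c.cumsum(axis=axis)
    return c


rng = np.random.default_rng(3)
N, m = 120, [3, 4, 5]
mix = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.7], [0.0, 0.0, 1.0]])
T_idx = to_indices(rng.normal(size=(N, 3)) @ mix, m)   # hypothetical dependent data

c_full = sdc_array(T_idx, m)              # n = 3
c_12 = sdc_array(T_idx[:, :2], m[:2])     # built only from the first two columns of T

# Setting u^3 = 1 in C_3 gives C_2; summing c over U_3 gives the marginal c_12.
print(np.allclose(cdf(c_full)[:, :, -1], cdf(c_12)), np.allclose(c_full.sum(axis=2), c_12))
```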

At the end of this section we show, from an information-theoretical point of view, why one should use the uniform partition and the sample derivated copula.

In the following we suppose that each $\mathcal{X}_i$, $i = 1, \dots, n$, is partitioned into the same number $m_i$ of intervals as in the previous case. We now denote the partitioning points by $y_j^i$, $j = 0, 1, \dots, m_i$, $i = 1, \dots, n$. This partition is arbitrary (for example equidistant) and does not have the property that each interval contains the same number of sample elements. The partition of $\mathcal{X}_i$ is given by $\{y_j^i \mid j \in \{0, 1, \dots, m_i\}\}$ and is denoted by $\mathcal{P}'_i$. We denote by $\mathcal{P}'$ the set of partitions $\mathcal{P}'_1, \dots, \mathcal{P}'_n$.

We denote the number of values of the variable $X_i$ belonging to $(y_{j-1}^i, y_j^i] \cap \mathcal{A}_i$ by $k_j^i$.

Let $Y_i$ be the categorical random variable associated to $X_i$, where

$$P(Y_i = j) = \frac{k_j^i}{N}, \quad j = 1, \dots, m_i.$$

The entropy of $X_i$ determined by the partition $\mathcal{P}'_i$ is:

$$H_{\mathcal{P}'_i}(X_i) = H(Y_i) = -\sum_{j=1}^{m_i} \frac{k_j^i}{N} \log_2 \frac{k_j^i}{N}, \quad i = 1, \dots, n.$$

It can be seen that the entropy $H_{\mathcal{P}'_i}(X_i)$ depends on the number of intervals $m_i$ and on the counts $k_j^i$.

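We introduce the following notation:

$$p_{j_1 \dots j_n} = P\left(X_1 \in (y_{j_1 - 1}^1, y_{j_1}^1],\ X_2 \in (y_{j_2 - 1}^2, y_{j_2}^2],\ \dots,\ X_n \in (y_{j_n - 1}^n, y_{j_n}^n]\right),$$

where $j_i = 1, \dots, m_i$, $i = 1, \dots, n$.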

The joint probability distribution determined by the partition $\mathcal{P}'$ has the joint entropy:

$$H_{\mathcal{P}'}(X_1, \dots, X_n) = -\sum_{j_1 = 1}^{m_1} \cdots \sum_{j_n = 1}^{m_n} p_{j_1 \dots j_n} \log_2 p_{j_1 \dots j_n}.$$



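The information content of the joint probability distribution determined by the partition $\mathcal{P}'$ is:

$$I_{\mathcal{P}'}(X_1, \dots, X_n) = \sum_{i=1}^{n} H_{\mathcal{P}'_i}(X_i) - H_{\mathcal{P}'}(X_1, \dots, X_n) = -\sum_{i=1}^{n} \sum_{j=1}^{m_i} \frac{k_j^i}{N} \log_2 \frac{k_j^i}{N} + \sum_{j_1 = 1}^{m_1} \cdots \sum_{j_n = 1}^{m_n} p_{j_1 \dots j_n} \log_2 p_{j_1 \dots j_n}. \qquad (7)$$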

Remark 6. In this case the information content depends on the numbers of intervals $m_i$, on the numbers of values $k_j^i$ in each one-dimensional interval, and on the probabilities of belonging to the $n$-dimensional intervals.

If we consider the uniform partition $\mathcal{P}$, then the entropy of $X_i$ is:

$$H_{\mathcal{P}_i}(X_i) = H(\hat{X}_i) = -\sum_{j=1}^{m_i} \frac{1}{m_i} \log_2 \frac{1}{m_i} = \log_2 m_i, \quad i = 1, \dots, n.$$

The entropy $H_{\mathcal{P}_i}(X_i)$ depends only on the number of intervals $m_i$.
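The contrast between the two partitions can be seen on a skewed sample. In the Python sketch below, the exponential sample, the choice $m = 6$ and the use of equidistant bins over the sample range as the arbitrary partition $\mathcal{P}'_i$ are illustrative assumptions. The marginal entropy under the arbitrary partition depends on the counts $k_j^i$, while under the uniform partition it is always $\log_2 m_i$.

```python
import numpy as np


def entropy_bits(counts) -> float:
    """Entropy (base 2) of the categorical variable with P(Y = j) = k_j / N."""
    k = np.asarray(counts, dtype=float)
    p = k[k > 0] / k.sum()
    return float(-np.sum(p * np.log2(p)))


rng = np.random.default_rng(6)
N, m = 1200, 6
x = rng.exponential(size=N)                   # a skewed, hypothetical sample

# Arbitrary partition P'_i: m equidistant intervals over the sample range; counts k_j vary.
# (np.histogram uses bins [a, b) with a closed last bin instead of (a, b]; this does not
# matter for the comparison.)
k_equidistant, _ = np.histogram(x, bins=np.linspace(x.min(), x.max(), m + 1))

# Uniform partition P_i: every interval holds n_i = N / m_i values.
k_uniform = np.full(m, N // m)

print(entropy_bits(k_equidistant), entropy_bits(k_uniform), np.log2(m))
```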
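We now express the probability

$$p_{j_1 \dots j_n} = P\left(X_1 \in (x_{j_1 - 1}^1, x_{j_1}^1],\ \dots,\ X_n \in (x_{j_n - 1}^n, x_{j_n}^n]\right) = P\left(U_1 = u_{j_1}^1, \dots, U_n = u_{j_n}^n\right) = c\left(u_{j_1}^1, \dots, u_{j_n}^n\right).$$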
The joint entropy associated to the partition is:

$$H_{\mathcal{P}}(X_1, \dots, X_n) = -\sum_{j_1 = 1}^{m_1} \cdots \sum_{j_n = 1}^{m_n} p_{j_1 \dots j_n} \log_2 p_{j_1 \dots j_n} = -\sum_{j_1 = 1}^{m_1} \cdots \sum_{j_n = 1}^{m_n} c\left(u_{j_1}^1, \dots, u_{j_n}^n\right) \log_2 c\left(u_{j_1}^1, \dots, u_{j_n}^n\right).$$

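The information content determined by the partition $\mathcal{P}$ is:

$$I_{\mathcal{P}}(X_1, \dots, X_n) = \sum_{i=1}^{n} H_{\mathcal{P}_i}(X_i) - H_{\mathcal{P}}(X_1, \dots, X_n) = \sum_{i=1}^{n} \log_2 m_i + \sum_{j_1 = 1}^{m_1} \cdots \sum_{j_n = 1}^{m_n} c\left(u_{j_1}^1, \dots, u_{j_n}^n\right) \log_2 c\left(u_{j_1}^1, \dots, u_{j_n}^n\right). \qquad (8)$$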

Remark 7. If we suppose that for all $i = 1, \dots, n$ the number of intervals $m_i$ is the same in the two cases discussed, then comparing formulas (7) and (8) we can see that for the uniform partition $\mathcal{P}$ the first sum of formula (8) is a constant determined by the numbers $m_i$ alone, so the information content depends only on the sample derivated copula.
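A practical consequence of (8): since the uniform partition is built from ranks, $I_{\mathcal{P}}$ is unchanged when each variable is transformed by a strictly increasing function, as such a transformation leaves $\mathcal{T}$ and hence $c$ unchanged. The Python sketch below (hypothetical helper names, simulated dependent pair) evaluates (8) before and after such a transformation.

```python
from collections import Counter

import numpy as np


def to_indices(sample, m):
    """Interval indices j (1..m_i) under the uniform partition, via ranks."""
    N = sample.shape[0]
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1
    return (ranks - 1) // (N // np.asarray(m)) + 1


def information_content(T_idx, m) -> float:
    """Formula (8): sum_i log2 m_i + sum over occupied cells of c log2 c (empty cells add 0)."""
    c = np.array(list(Counter(map(tuple, T_idx.tolist())).values())) / len(T_idx)
    return float(np.sum(np.log2(m)) + np.sum(c * np.log2(c)))


rng = np.random.default_rng(7)
N, m = 600, [4, 5]
x = rng.normal(size=N)
y = 0.8 * x + 0.6 * rng.normal(size=N)        # hypothetical dependent pair

I1 = information_content(to_indices(np.column_stack([x, y]), m), m)
# Strictly increasing maps of each coordinate keep the ranks, hence T and c, unchanged.
I2 = information_content(to_indices(np.column_stack([np.exp(x), y ** 3]), m), m)
print(I1, I2)
```

For two variables, $I_{\mathcal{P}}$ is the plug-in mutual information of the discretized pair, so it is bounded by $\min_i \log_2 m_i$.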



4. The junction tree approach applied to the sample derivated copula.

We introduced the sample derivated copula as a discrete probability distribution with uniform marginals. In Theorem 3 we proved that this special copula fulfills the consistency properties.

Let $V = \{1, \dots, n\}$ again be a set of vertices, and let an acyclic hypergraph be defined over $V$. We denote by $\Gamma$ and $\mathcal{S}$ the sets of clusters and separators of the hypergraph, which determine a junction tree $J$. The marginal probability distributions associated to the clusters $K = \{i_1, \dots, i_t\} \in \Gamma$ are denoted by $c_K(\mathbf{U}_K) = c_{i_1, \dots, i_t}(U_{i_1}, \dots, U_{i_t})$. The marginal probability distributions associated to the separators are denoted in the same way by $c_S(\mathbf{U}_S)$. The joint discrete copula is shortly denoted by $c(\mathbf{U})$ and the univariate marginals by $c_i(U_i)$, $i = 1, \dots, n$.

In this section we are going to use the following common notation:

$$\sum_{\mathbf{U}} f(\mathbf{U}) = \sum_{i_1 = 1}^{m_1} \cdots \sum_{i_n = 1}^{m_n} f\left(U_1 = u_{i_1}^1, \dots, U_n = u_{i_n}^n\right),$$

where $u_{i_k}^k$, $i_k = 1, \dots, m_k$, are the possible values of the random variable $U_k$, $k = 1, \dots, n$, and $f$ is an arbitrary $n$-dimensional function. This simplified notation is used for products, too.

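Definition 6. The junction tree distribution given by

$$c_J(\mathbf{U}) = \frac{\prod_{K \in \Gamma} c_K(\mathbf{U}_K)}{\prod_{S \in \mathcal{S}} \left[c_S(\mathbf{U}_S)\right]^{\nu_S - 1}},$$

where $\nu_S$ is the number of clusters connected by the separator $S$, is called the copula junction tree distribution, or shortly the junction tree copula.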

The problem is to find the junction tree copula which best fits the sample derivated copula. The goodness of fit will be quantified by the Kullback-Leibler divergence jij.

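Theorem 4. The Kullback-Leibler divergence between the approximation $c_J(\mathbf{U})$ and the sample derivated copula $c(\mathbf{U})$ is given by the formula:

$$KL\left(c_J(\mathbf{U}), c(\mathbf{U})\right) = -H(\mathbf{U}) - \left[\sum_{K \in \Gamma} I(\mathbf{U}_K) - \sum_{S \in \mathcal{S}} (\nu_S - 1)\, I(\mathbf{U}_S)\right] + \sum_{i=1}^{n} \log_2 m_i.$$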
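Proof.

$$KL\left(c_J(\mathbf{U}), c(\mathbf{U})\right) = \sum_{\mathbf{U}} c(\mathbf{U}) \log_2 \frac{c(\mathbf{U})}{c_J(\mathbf{U})} = \sum_{\mathbf{U}} c(\mathbf{U}) \log_2 c(\mathbf{U}) - \sum_{\mathbf{U}} c(\mathbf{U}) \log_2 c_J(\mathbf{U})$$

$$= -H(\mathbf{U}) - \sum_{\mathbf{U}} c(\mathbf{U}) \log_2 \prod_{K \in \Gamma} c_K(\mathbf{U}_K) + \sum_{\mathbf{U}} c(\mathbf{U}) \log_2 \prod_{S \in \mathcal{S}} \left[c_S(\mathbf{U}_S)\right]^{\nu_S - 1}.$$

We add and subtract the sum

$$\sum_{\mathbf{U}} c(\mathbf{U}) \log_2 \prod_{i=1}^{n} c_i(U_i). \qquad (10)$$

It follows from the definition of the junction tree that $\bigcup_{K \in \Gamma} K = V$, and each variable belongs to exactly one more cluster than separator (see the proof of Theorem 1). So by adding and subtracting (10) we obtain the following:

$$KL\left(c_J(\mathbf{U}), c(\mathbf{U})\right) = -H(\mathbf{U}) - \sum_{\mathbf{U}} c(\mathbf{U}) \sum_{K \in \Gamma} \log_2 \frac{c_K(\mathbf{U}_K)}{\prod_{i \in K} c_i(U_i)} + \sum_{\mathbf{U}} c(\mathbf{U}) \sum_{S \in \mathcal{S}} (\nu_S - 1) \log_2 \frac{c_S(\mathbf{U}_S)}{\prod_{i \in S} c_i(U_i)} - \sum_{\mathbf{U}} c(\mathbf{U}) \sum_{i=1}^{n} \log_2 c_i(U_i).$$

Since the sample derivated copula has the property that all $c_K(\mathbf{U}_K)$, $c_S(\mathbf{U}_S)$ and $c_i(U_i)$ are consistent marginals of $c(\mathbf{U})$ (see Theorem 3), we have the following relations:

$$\sum_{\mathbf{U}} c(\mathbf{U}) \sum_{K \in \Gamma} \log_2 \frac{c_K(\mathbf{U}_K)}{\prod_{i \in K} c_i(U_i)} = \sum_{K \in \Gamma} \sum_{\mathbf{U}_K} c_K(\mathbf{U}_K) \log_2 \frac{c_K(\mathbf{U}_K)}{\prod_{i \in K} c_i(U_i)} = \sum_{K \in \Gamma} I(\mathbf{U}_K),$$

$$\sum_{\mathbf{U}} c(\mathbf{U}) \sum_{S \in \mathcal{S}} (\nu_S - 1) \log_2 \frac{c_S(\mathbf{U}_S)}{\prod_{i \in S} c_i(U_i)} = \sum_{S \in \mathcal{S}} (\nu_S - 1) \sum_{\mathbf{U}_S} c_S(\mathbf{U}_S) \log_2 \frac{c_S(\mathbf{U}_S)}{\prod_{i \in S} c_i(U_i)} = \sum_{S \in \mathcal{S}} (\nu_S - 1)\, I(\mathbf{U}_S),$$

$$\sum_{\mathbf{U}} c(\mathbf{U}) \sum_{i=1}^{n} \log_2 c_i(U_i) = -\sum_{i=1}^{n} H(U_i) = -\sum_{i=1}^{n} \log_2 m_i.$$

Here $I(\mathbf{U}_K)$ and $I(\mathbf{U}_S)$ are the information contents of the probability distributions of the marginals $c_K(\mathbf{U}_K)$ and $c_S(\mathbf{U}_S)$ (see 01). By substituting these relations we obtain:

$$KL\left(c_J(\mathbf{U}), c(\mathbf{U})\right) = -H(\mathbf{U}) - \left[\sum_{K \in \Gamma} I(\mathbf{U}_K) - \sum_{S \in \mathcal{S}} (\nu_S - 1)\, I(\mathbf{U}_S)\right] + \sum_{i=1}^{n} \log_2 m_i.$$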



Remark 8. The difference $\sum_{i=1}^{n} \log_2 m_i - H(\mathbf{U})$ does not depend on the junction tree structure.



Definition 7. The difference

$$\sum_{K \in \Gamma} I(\mathbf{U}_K) - \sum_{S \in \mathcal{S}} (\nu_S - 1)\, I(\mathbf{U}_S)$$

is called the weight of the junction tree copula.



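By Theorem 4 and Remark 8, the Kullback-Leibler divergence equals a quantity that does not depend on the junction tree minus the weight of the junction tree copula. Hence, in order to find a better approximation using junction trees, we have to find the junction tree having the largest weight.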
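Theorem 4 can be checked numerically. The Python sketch below (hypothetical helper names, simulated chain-like data, $m_i = 3$) builds the sample derivated copula distribution of three variables, forms the junction tree copula with clusters $\{1,2\}$, $\{2,3\}$ and separator $\{2\}$, and compares the directly computed divergence $\sum_{\mathbf{U}} c \log_2 (c / c_J)$ with $-H(\mathbf{U}) - \text{weight} + \sum_i \log_2 m_i$.

```python
import numpy as np


def to_indices(sample, m):
    """Interval indices j (1..m_i) under the uniform partition, via ranks."""
    N = sample.shape[0]
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1
    return (ranks - 1) // (N // np.asarray(m)) + 1


def sdc_array(T_idx, m):
    """Sample derivated copula distribution as an array indexed by (j_1 - 1, ..., j_n - 1)."""
    c = np.zeros(tuple(m))
    np.add.at(c, tuple((T_idx - 1).T), 1.0)
    return c / len(T_idx)


def info(c) -> float:
    """Information content (8) of a (marginal) sample derivated copula distribution."""
    nz = c[c > 0]
    return float(np.sum(np.log2(c.shape)) + np.sum(nz * np.log2(nz)))


rng = np.random.default_rng(4)
N, m = 900, [3, 3, 3]
x1 = rng.normal(size=N)
x2 = x1 + rng.normal(size=N)
x3 = x2 + rng.normal(size=N)                  # hypothetical chain-like data
c = sdc_array(to_indices(np.column_stack([x1, x2, x3]), m), m)

# Junction tree J: clusters {1,2}, {2,3}, separator {2} with nu_S = 2.
c12, c23, c2 = c.sum(axis=2), c.sum(axis=0), c.sum(axis=(0, 2))
cJ = c12[:, :, None] * c23[None, :, :] / c2[None, :, None]

mask = c > 0
kl_direct = float(np.sum(c[mask] * np.log2(c[mask] / cJ[mask])))
H = float(-np.sum(c[mask] * np.log2(c[mask])))
weight = info(c12) + info(c23) - (2 - 1) * info(c2)   # info(c2) is 0 for a uniform marginal
kl_formula = -H - weight + float(np.sum(np.log2(m)))
print(kl_direct, kl_formula, weight)
```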

Finding the best fitting $k$-width junction tree (in which the largest cluster contains $k$ elements) for $k > 2$ is an NP-hard problem. For $k = 2$ the problem is similar to the Chow-Liu approximation j^]. In this case it is possible to find the best fitting second order junction tree by Kruskal's or Prim's algorithm.
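For $k = 2$ the clusters are pairs of variables and the separators are single variables, whose information content is 0 because the marginals are uniform; the weight therefore reduces to the sum of the pairwise information contents over the tree edges, and a maximum weight spanning tree gives the best second order junction tree. The Python sketch below uses Kruskal's algorithm with a union-find forest; the helper names and the simulated chain $X_0 \to X_1 \to X_2 \to X_3$ are illustrative assumptions.

```python
import itertools

import numpy as np


def to_indices(sample, m):
    """Interval indices j (1..m_i) under the uniform partition, via ranks."""
    N = sample.shape[0]
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1
    return (ranks - 1) // (N // np.asarray(m)) + 1


def pair_info(T_idx, a: int, b: int, m) -> float:
    """Information content (8) of the two-dimensional marginal SDC of variables a and b."""
    c = np.zeros((m[a], m[b]))
    np.add.at(c, (T_idx[:, a] - 1, T_idx[:, b] - 1), 1.0)
    c /= len(T_idx)
    nz = c[c > 0]
    return float(np.log2(m[a]) + np.log2(m[b]) + np.sum(nz * np.log2(nz)))


def max_weight_spanning_tree(n: int, weights: dict) -> list:
    """Kruskal's algorithm: scan edges by decreasing weight, keep those joining two components."""
    parent = list(range(n))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    tree = []
    for w, a, b in sorted(((w, a, b) for (a, b), w in weights.items()), reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b))
    return tree


rng = np.random.default_rng(5)
N, m = 2000, [4, 4, 4, 4]
cols = [rng.normal(size=N)]
for _ in range(3):                            # hypothetical chain X0 -> X1 -> X2 -> X3
    cols.append(cols[-1] + 0.5 * rng.normal(size=N))
T_idx = to_indices(np.column_stack(cols), m)

weights = {(a, b): pair_info(T_idx, a, b, m) for a, b in itertools.combinations(range(4), 2)}
tree = max_weight_spanning_tree(4, weights)
print(sorted(tree))
```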

For $k \ge 3$ a heuristic approach introduced by the authors in [3] and [14] can be used successfully. The idea is to fit a special kind of junction tree, called the t-cherry junction tree.



5. Conclusions and possible applications 

One of the advantages of the junction tree copula is that it reveals some of the conditional independences between the variables involved. This kind of dependence structure is not exploited by a general multivariate copula function. Another advantage of the method is that a multivariate copula can be decomposed into some lower-dimensional sample derivated copulas.

The sample derivated copula approach is useful in cases when nothing else is known about the probability distribution but an i.i.d. sample. If the uniform partition is applied, the whole information content depends only on the sample derivated copula.
The copula junction tree can be used in feature selection, which is a key question in many fields such as finance, medicine and biostatistics.



We obtained very good numerical results in pattern recognition (see [15]). First we applied the uniform partition to discretize the continuous random variables, then constructed the t-cherry junction tree approximation. In this way we found the informative features and so reduced the dimension of the classifier.